Identification of promising high-affinity inhibitors of SARS-CoV-2 main protease from African Natural Products Databases by Virtual Screening

The rapid spread of the new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of the COVID-19 pandemic, has created a serious threat to global public health and calls for urgent research into potential therapeutic agents. The availability of SARS-CoV-2 genomic data and efforts to determine the structures of viral proteins have facilitated the identification of potent inhibitors using structure-based approaches and bioinformatics tools. Many pharmaceuticals have been proposed for the treatment of COVID-19, although their effectiveness has not yet been assessed. It is nevertheless important to find new targeted drugs to address concerns about drug resistance. Several viral proteins, such as proteases, polymerases and structural proteins, have been considered as potential therapeutic targets. However, a viral target must be essential for host invasion and meet druggability criteria. In this work, we selected the highly validated pharmacological target main protease (Mpro) and performed high-throughput virtual screening of African natural products databases (NANPDB, EANPDB, AfroDb and SANCDB) to identify the most potent inhibitors with the best pharmacological properties. In total, 8753 natural compounds were virtually screened with AutoDock Vina against the SARS-CoV-2 main protease. Two hundred and five (205) compounds showed high affinity scores (≤ −10.0 kcal/mol), and fifty-eight (58) of them, filtered through Lipinski's rules, showed better affinity than known Mpro inhibitors (e.g., ABBV-744, Onalespib, Daunorubicin, Alpha-ketoamide, Perampanel, Carprofen, Celecoxib, Alprazolam, Trovafloxacin, Sarafloxacin, Ethyl biscoumacetate…). These promising compounds could be considered for further investigation toward the development of drugs against SARS-CoV-2.

determination methods has facilitated the identification of potential therapeutic targets and inhibitors using bioinformatics tools.
Thus, many pharmaceuticals have been proposed for the treatment of COVID-19, although their efficacy has not yet been evaluated [4], [5], [6]. Several viral proteins, such as proteases, polymerases and structural proteins, have been considered as potential therapeutic targets [7].
Natural products (NPs) are broadly defined as chemical substances produced by living organisms. More precise definitions of NPs exist, but there is no consensus among them. Some include all small molecules resulting from metabolic reactions, whereas others classify as "NPs" only the products of secondary (non-essential) metabolism [8], [9]. In this document, the term natural products designates active compounds extracted from plants. They have benefited mankind as food, pesticides, cosmetic products and especially drugs [10], [8]. The crude plant extracts used in traditional medicine contain many pharmacologically active compounds [11], [12], and they have shown their healing power against diseases since ancient civilizations [13]. These crude medicines can lead to the discovery of other active molecules and eventually to the development of chemically pure drugs with real beneficial effects. Currently, many prescribed medicines are derived from investigations of natural products. The oldest examples of drugs based on natural products are analgesics: willow bark was used to relieve pain because of salicin, a natural product hydrolyzed into salicylic acid; acetylsalicylic acid, better known as aspirin, is a synthetic derivative used as an analgesic [13]. Notable approved drugs obtained from pure NPs or NP derivatives include the antibiotic lefamulin, the aminoglycoside antibiotic plazomicin, the antimalarial agent tafenoquine succinate and the anticancer agent aplidine [14]. These findings are positive proof that natural products can lead to effective drugs.
After the first SARS-CoV epidemic in the early 2000s, the main protease (Mpro, nsp5), also named chymotrypsin-like protease (3CLpro) [15], became the object of particular attention. Many studies have demonstrated that this protease is a valuable therapeutic target because of its essential role in viral replication [16], [17]. Other important coronavirus therapeutic targets include the spike protein (S), the RNA-dependent RNA polymerase (RdRp, nsp12), the NTPase/helicase (nsp13) and the papain-like protease (PLpro, part of nsp3) [18], [19]. Mpro plays a central role in viral replication and transcription through extensive proteolytic processing of two replicase polyproteins, pp1a (486 kDa) and pp1ab (790 kDa) [20]. It exists only in viruses and has no counterpart in humans. Interestingly, it is the most conserved enzyme among viruses related to SARS-CoV-2 [21]. The sequence of SARS-CoV-2 Mpro shows high identity (> 96%) with that of SARS-CoV; among the differing residues, one key substitution (Thr285Ala) may contribute to the high infectivity of SARS-CoV-2 [22], [23]. This functional centrality of Mpro in the viral life cycle makes it an interesting target for drug development against SARS and other CoV infections: its inhibition may block the production of infectious virus particles and thus alleviate disease symptoms [16], [24]. Mpro is therefore one of the most attractive viral targets for antiviral drug discovery against SARS-CoV-2.
That is why, in this study, we investigated potential inhibitors of the main protease [25] of SARS-CoV-2 by high-throughput virtual screening of African natural products databases.

Screening data:
In this study, we used a dataset of compounds from African natural products databases: AfroDb [26], EANPDB [9], NANPDB [27] and SANCDB [10], [28]. Because these databases did not share the same formats, they were prepared differently. The molecules of the AfroDb and EANPDB databases were prepared using PyRx (Python Prescription) 0.8, while those of SANCDB and NANPDB were prepared with Open Babel [29] using homemade scripts. All compounds in SDF (Structure Data File) format were then converted to the PDBQT format. The control compounds (alpha-ketoamide 13b, Daunorubicin, Onalespib and ABBV-744) were prepared using AutoDockTools.
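
The Open Babel conversion of the SANCDB and NANPDB files can be sketched as follows. The homemade scripts are not given in the text, so this is only an illustration under stated assumptions: it assumes the obabel command-line tool is installed and that each database is a single multi-molecule SDF file that already holds 3D coordinates; the file and folder names are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def obabel_sdf_to_pdbqt_cmd(sdf_path: str, out_dir: str,
                            add_hydrogens: bool = True) -> list[str]:
    """Build an obabel command that writes one PDBQT file per molecule.

    With -m, Open Babel splits a multi-molecule input into numbered
    output files. Assumes the SDF already holds 3D coordinates; 2D
    input would additionally need --gen3d.
    """
    stem = Path(sdf_path).stem
    out_pattern = str(Path(out_dir) / f"{stem}.pdbqt")
    cmd = ["obabel", sdf_path, "-O", out_pattern, "-m"]
    if add_hydrogens:
        cmd.append("-h")  # add explicit hydrogens before writing PDBQT
    return cmd


def convert(sdf_path: str, out_dir: str) -> None:
    """Run the conversion (requires obabel on the PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(obabel_sdf_to_pdbqt_cmd(sdf_path, out_dir), check=True)


if __name__ == "__main__":
    # Hypothetical file names; prints the command instead of running it.
    print(shlex.join(obabel_sdf_to_pdbqt_cmd("sancdb.sdf", "ligands")))
```

Building the command separately from running it makes it easy to inspect or log exactly what is executed for each database.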

Protein preparation:
The main protease (PDB code: 6Y2F) in complex with alpha-ketoamide 13b (αk-13b) was used as the receptor file. The coordinate file was loaded into PyMol to delete the ligand [6], [30]. The ligand-free protein structure was then prepared with AutoDockTools-1.5.7 (ADT) [31]: ADT removed non-polar hydrogens, added Gasteiger charges, and assigned solvation parameters and atom types. The Computed Atlas of Surface Topography of proteins (CASTp) server is an online server that locates and measures pockets and voids on 3D protein structures [32], [33]; it was used to determine the size of the ligand-binding pocket. Based on the pocket size, and taking into account the position of alpha-ketoamide 13b in the crystal structure, the grid box was set to center_x = -0.000, center_y = -0.704, center_z = -0.000 and size_x = 40, size_y = 40, size_z = 40 (Å).
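
The grid box above maps directly onto the center_* and size_* keys of an AutoDock Vina configuration file. The sketch below writes such a file and computes the box bounds and volume; the receptor file name is hypothetical, and in this sketch the ligand is left out of the file so that it can be supplied per docking run.

```python
from __future__ import annotations

# Grid box values reported in the text for PDB 6Y2F (angstrom).
CENTER = (-0.000, -0.704, -0.000)
SIZE = (40, 40, 40)


def vina_config(receptor: str, center=CENTER, size=SIZE) -> str:
    """Return Vina configuration text for a fixed receptor and box."""
    lines = [f"receptor = {receptor}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value}")
    return "\n".join(lines) + "\n"


def box_bounds(center=CENTER, size=SIZE) -> list[tuple[float, float]]:
    """(min, max) coordinate of the search box along x, y and z."""
    return [(c - s / 2, c + s / 2) for c, s in zip(center, size)]


def box_volume(size=SIZE) -> float:
    """Search-space volume in cubic angstrom."""
    x, y, z = size
    return float(x * y * z)


if __name__ == "__main__":
    # Hypothetical receptor file name.
    print(vina_config("6y2f_receptor.pdbqt"), end="")
    print(box_bounds(), box_volume())
```

With a 40 Å cube the search space is 64,000 Å³, spanning x from -20 to 20 Å and y from -20.704 to 19.296 Å.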

Virtual screening:
Virtual screening is a widely used technique for identifying the top compounds against specific proteins from libraries of thousands of compounds [34]. A total of 8741 molecules were virtually screened against Mpro using AutoDock Vina 1.1.2 [35] from the command line. After the protein and ligands were prepared in PDBQT format, all files were placed in the same folder together with a configuration file containing the receptor name, the grid box coordinates and the list of ligands. The Vina script was launched with default parameters.
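
Because AutoDock Vina 1.1.2 docks a single ligand per invocation, a screen of this size is usually driven by a loop over the ligand files. The following is a minimal sketch of such a driver, not the authors' script; the folder names, the conf.txt file name and the output naming scheme are assumptions.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Iterable


def vina_commands(ligands: Iterable[str], config: str = "conf.txt",
                  out_dir: str = "docked") -> list[list[str]]:
    """One Vina command per ligand PDBQT file, in sorted order.

    No search parameters (exhaustiveness, num_modes, ...) are passed,
    so Vina's defaults apply unless the config file sets them.
    """
    cmds = []
    for lig in sorted(ligands):
        stem = Path(lig).stem
        cmds.append([
            "vina", "--config", config, "--ligand", lig,
            "--out", str(Path(out_dir) / f"{stem}_out.pdbqt"),
            "--log", str(Path(out_dir) / f"{stem}.log"),
        ])
    return cmds


def run_screen(ligand_dir: str, config: str = "conf.txt",
               out_dir: str = "docked") -> None:
    """Dock every *.pdbqt file in ligand_dir (requires vina on the PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    ligands = [str(p) for p in Path(ligand_dir).glob("*.pdbqt")]
    for cmd in vina_commands(ligands, config, out_dir):
        subprocess.run(cmd, check=True)
```

Writing one output and one log file per ligand keeps a failed or interrupted run easy to resume, since already-docked compounds can be skipped by checking for their output files.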

Verification of Lipinski's rule of five:
Lipinski's rule helps to distinguish drug-like from non-drug-like molecules [36]. It states that a drug-like molecule should violate no more than one of the following criteria: molecular weight no greater than 500 Da, lipophilicity (expressed as LogP) no greater than 5, no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, and molar refractivity between 40 and 130 [37], [36]. These parameters were verified using the SwissADME server (http://www.swissadme.ch/) [38].
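
The drug-likeness check can be expressed as a simple violation count. This sketch uses the thresholds listed above and assumes the descriptor values have already been obtained (for example, exported from SwissADME); it does not reproduce SwissADME's own filter implementation, and the one-violation tolerance is a parameter.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mw: float    # molecular weight (Da)
    logp: float  # lipophilicity, LogP
    hbd: int     # hydrogen bond donors
    hba: int     # hydrogen bond acceptors
    mr: float    # molar refractivity


def rule_violations(d: Descriptors) -> list[str]:
    """Names of the criteria (as listed in the text) that d fails."""
    checks = {
        "MW <= 500": d.mw <= 500,
        "LogP <= 5": d.logp <= 5,
        "HBD <= 5": d.hbd <= 5,
        "HBA <= 10": d.hba <= 10,
        "40 <= MR <= 130": 40 <= d.mr <= 130,
    }
    return [name for name, ok in checks.items() if not ok]


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """True when d violates at most max_violations criteria."""
    return len(rule_violations(d)) <= max_violations
```

Returning the names of the failed criteria, rather than only a count, makes it possible to report which rule each rejected compound breaks.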

Protein-ligand interaction determination:
The interactions between Mpro and the small molecules with the highest affinity scores were determined using PyMol-2.0 [39].
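
PyMol and LigPlot+ classify interactions (hydrogen bonds, hydrophobic contacts) with their own geometric criteria. As a rough, tool-independent cross-check, the residues in contact with a docked pose can be listed with a plain distance cutoff. The sketch below is such a simplification, not the method used in the study: the 4 Å cutoff is an assumption, and it reads only the fixed-column coordinate fields shared by PDB and PDBQT files.

```python
from __future__ import annotations

import math
from typing import Iterable, Iterator, Tuple

XYZ = Tuple[float, float, float]


def parse_atoms(lines: Iterable[str], records=("ATOM", "HETATM")
                ) -> Iterator[Tuple[str, XYZ]]:
    """Yield (residue label, xyz) from fixed-column PDB/PDBQT atom lines."""
    for line in lines:
        if line.startswith(records):
            res_name = line[17:20].strip()
            chain = line[21].strip()
            res_seq = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield f"{res_name}{res_seq}{chain}", xyz


def contact_residues(receptor_lines: Iterable[str],
                     ligand_lines: Iterable[str],
                     cutoff: float = 4.0) -> list[str]:
    """Receptor residues with any atom within `cutoff` angstrom of the ligand."""
    ligand_xyz = [xyz for _, xyz in parse_atoms(ligand_lines)]
    hits = set()
    for label, xyz in parse_atoms(receptor_lines, records=("ATOM",)):
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            hits.add(label)
    return sorted(hits)
```

For a Vina output file, only the lines of the first MODEL (the top-ranked pose) should be passed as ligand_lines; otherwise contacts from all poses are pooled.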

Natural Compounds from African Databases:
AfroDb is a collection of natural products from African medicinal plants with known bioactivities [26]. It represents the largest and most diverse collection of 3D structures of natural products covering the entire African continent. These structures can be easily downloaded and used in virtual screening studies (http://african-compounds.org/about/afrodb/). The compounds with a large number of tested biological activities are included in the ZINC database (http://zinc.docking.org/catalogs/afronp/). The South African Natural Compounds Database (SANCDB) is a fully referenced database of natural compounds from sources in South Africa (https://sancdb.rubi.ru.ac.za/) [10]. The Northern African Natural Products Database (NANPDB) is the largest collection of natural compounds produced by indigenous organisms of North Africa (http://african-compounds.org/nanpdb/) [41]. The Eastern Africa Natural Products Database (EANPDB, http://african-compounds.org) contains the structural and bioactivity information of 1870 unique isolated molecules from about 300 source species from the Eastern African region [9]. In total, 8741 natural compounds were obtained from these databases (Table 1).

SARS-CoV-2 active site:
The structure of SARS-CoV-2 Mpro (PDB code: 6Y2F) was used for the virtual screening. The binding pocket determined by the CASTp server is shown in Fig. 1.
The structure consists of two protomers (A and B), each with a catalytic dyad (His41-Cys145), and is very similar to that of the SARS-CoV protease [42]. Each protomer is composed of three domains: domains I (residues 1-101) and II (residues 102-184), which are mainly made of antiparallel β-sheets, and an α-helical domain III (residues 201-301) [43], [44]. Domain III, located far from the catalytic dyad, contains the Ser284-Thr285-Ile286 segment in SARS-CoV; one major difference in SARS-CoV-2 is the substitution of Thr285 by Ala [23].
The left panel of Fig. 1 shows the binding pocket of Mpro, represented by the red surface. The molecular surface (MS) volume of the pocket is 1738.8 Å³. This pocket is located between the two protomers. It was used in AutoDockTools to define a grid box covering the amino acids of the binding site (Fig. 1, right). The grid box was made large enough to allow a wide variety of natural compounds to dock with the protein.
Indeed, a molecule that binds strongly at the interface between the two protomers could inactivate the enzyme, possibly even by preventing its dimerization [45].

Identification of promising inhibitors from the virtual screening:
In total, 8741 molecules were screened against SARS-CoV-2 Mpro, among which two hundred and five (205) molecules presented affinity scores ranging from −12.1 kcal/mol to −10.0 kcal/mol.
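
The affinity filter can be sketched as follows, assuming the Vina output files are available. AutoDock Vina writes the poses best-first, each preceded by a "REMARK VINA RESULT:" line whose first number is the affinity in kcal/mol. The threshold is applied inclusively (≤ −10.0 kcal/mol), matching the reported range of −12.1 to −10.0 kcal/mol; the compound names used with it are hypothetical.

```python
from __future__ import annotations


def best_affinity(out_pdbqt_text: str) -> float | None:
    """Affinity (kcal/mol) of the top-ranked pose in a Vina output PDBQT.

    Poses are written best-first, so the first 'REMARK VINA RESULT:'
    line belongs to the best pose. Returns None if no pose is found.
    """
    for line in out_pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            return float(line.split(":", 1)[1].split()[0])
    return None


def select_hits(scores: dict[str, float],
                threshold: float = -10.0) -> list[tuple[str, float]]:
    """Compounds scoring at or below the threshold, strongest binders first."""
    hits = [(name, s) for name, s in scores.items() if s <= threshold]
    return sorted(hits, key=lambda item: item[1])
```

In practice, scores would be collected by calling best_affinity on each output file and keying the result by compound name before passing the dictionary to select_hits.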
Among them, fifty-eight (58) passed Lipinski's rules [37], and twelve (12) did not present any violation of the rules. These molecules had affinity scores ranging from −11.2 kcal/mol to −10.0 kcal/mol and were considered promising inhibitors of SARS-CoV-2 Mpro (Table 2). The molecules ABBV-744, Daunorubicin and Onalespib, described as potent inhibitors of Mpro [6], were also screened as controls using the same docking parameters (Table 2).
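
The two-stage funnel described above (affinity, then Lipinski's rules) can be sketched as a ranking followed by two filters. The interpretation that passing Lipinski's rules means at most one violation is an assumption of this sketch, as are the record layout and names.

```python
from __future__ import annotations

from typing import List, Tuple

# (compound name, Vina affinity in kcal/mol, number of Lipinski violations)
Record = Tuple[str, float, int]


def lipinski_tiers(hits: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split high-affinity hits into two tiers.

    Returns (passed, clean): passed = at most one violation,
    clean = no violation; both ordered from the strongest affinity.
    """
    ranked = sorted(hits, key=lambda h: h[1])  # most negative first
    passed = [h for h in ranked if h[2] <= 1]
    clean = [h for h in passed if h[2] == 0]
    return passed, clean
```

Keeping both tiers, rather than only the violation-free subset, leaves room to revisit compounds with a single violation later.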

SARS-CoV-2 interactions with identified potent inhibitors:
For a better understanding of the interactions of Mpro with the molecules in Table 2, LigPlot+ and PyMol were used to display the interactions (Table 3). The inhibitors showed specific interactions with key residues of Mpro.

Table 3. Interactions between Mpro and the thirteen potent inhibitors (molecules and interacting residues).
Olibanumol H: Arg-4, Gly-138; Lys-5, 2Phe-3
Clionamine D: Phe-3, Leu-282; Phe-3, Leu-282

Table 3 presents the details of the interactions between Mpro and the twelve identified compounds. The analysis of the interactions showed that the inhibitors interact mostly with residues Arg4, Leu282, Gly283 and Glu288.
Fig. 2 shows the interactions between Mpro and the top four inhibitors, selected on the basis of their affinity scores, docked poses and interactions with key amino acids of the protein. The details for the other compounds are presented in the supplementary data (Fig. 3).
The LigPlot+ program was used to map the 2D interactions between Mpro and the top four identified compounds (supplementary data, Fig. 4).
The interaction details are summarized in Table 4.
Table 4. Summary of interactions between Mpro and the top four compounds.
Table 4 reveals that the main Mpro residues interacting with the ligands were Arg 4, Lys 5, Glu 283, Gly 283 and Glu 288, involving the catalytic domain surrounding the active-site dyad (His41-Cys145).

Discussion
The discovery of new drugs capable of inhibiting SARS-CoV-2 infection is a global priority in order to put a definitive end to this health emergency. Many studies have demonstrated that Mpro is an attractive target against SARS-CoV-2 because of its important role in virus replication, its conservation among related viruses, and its cleavage specificity, which differs from that of human proteases [20], [21]. Many crystallographic structures are available for this target, which makes it very suitable for structure-based drug design [47], [17]. Because Africa is rich in plant diversity, natural products, which are easy to access and inexpensive, constitute the first recourse for health care [8]. African natural products have demonstrated antiviral, antifungal and antibacterial properties [46]. These molecules are therefore suitable candidates for high-throughput virtual screening against validated drug targets.
The analysis of the interactions of the identified compounds with Mpro revealed that they bind mainly to residues Arg 4, Lys 5, Glu 283, Gly 283 and Glu 288, involving the catalytic domain surrounding the active-site dyad (His41-Cys145). Hence, the selected compounds could potentially inhibit the activity of the enzyme. It would be worthwhile to pursue experimental assays of these promising inhibitors against SARS-CoV-2 Mpro.
Many of these molecules have already shown efficacy in other studies: Sphaeropsidin A against drug-resistant cancer cells [49]; Gypsogenic acid against Bacillus subtilis and Bacillus thuringiensis, and as a potential antitumor agent [50]; Yardenone against hypoxia-inducible factor 1 (HIF-1) activation in breast and prostate tumor cells [51]; Epigallocatechin, extracted from Acacia karroo, a plant used medicinally to treat diarrhea, colds, dysentery, conjunctivitis and hemorrhages (Acacia karroo and other local plant species such as Artemisia afra, Ziziphus mucronata and Eucomis autumnalis have been widely used to treat symptoms related to listeriosis [52]); and A-homo-3a-oxa-5beta-olean-12-en-3-one-28-oic acid, extracted from Albizia gummifera, a plant used in indigenous medical systems for various purposes [53].
This study's findings suggest that the identified compounds are promising inhibitors of the SARS-CoV-2 main protease Mpro and of interest for the development of effective drugs against SARS-CoV-2. Further investigations are needed to deepen these findings in the drug development process.

Conclusion
In this study, we identified twelve (12) compounds (Sphaeropsidin A, Gypsogenic acid, Yardenone, A-homo-3a-oxa-5beta-olean-12-en-3-one-28-oic acid, Epigallocatechin, Neopellitorine B, Caretroside A, Pallidol, Maslinic acid, Cabralealactone, Tribulus saponin aglycone 3, Olibanumol H, Clionamine D) from African natural product databases that showed high affinity for the main protease Mpro, one of the most attractive targets of SARS-CoV-2, together with favorable drug-like properties. Further investigations should be pursued with the identified compounds to pave the way toward the development of new drugs against SARS-CoV-2.